Word Searching in Document Images Using Word Portion Matching
Abstract
This paper proposes an approach for searching for a word portion in document images, to facilitate the detection and location of user-specified query words. A feature string is synthesized from the character sequence of the user-specified word, and each word image extracted from the documents is represented by a feature string. An inexact string matching technique then measures the similarity between the two feature strings, from which we can estimate how relevant the document word image is to the user-specified word and decide whether a portion of it is the same as the user-specified word. Experimental results on real document images show that the approach is promising: it can detect and locate document words that match the user-specified word entirely or partially.
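The inexact matching step described in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: the abstract does not give the feature codes or the cost scheme, so the sketch assumes plain characters stand in for feature symbols, uses unit substitution and gap costs, and swaps in a standard approximate substring search by dynamic programming. The function name `best_portion_match` and its parameters are hypothetical.

```python
def best_portion_match(query, word, sub_cost=1, gap_cost=1):
    """Find the portion of `word` closest to `query` under edit distance.

    Returns (cost, start, end), where word[start:end] is the best-matching
    portion. Both arguments are sequences of symbols; ordinary characters
    are an assumed stand-in for the paper's feature codes.
    """
    m, n = len(query), len(word)
    # prev[j] = (cost, start) of the best alignment of the query prefix
    # processed so far against some portion of `word` that ends at j.
    # An empty query prefix matches at any position for free, which lets
    # the match begin anywhere inside the word.
    prev = [(0, j) for j in range(n + 1)]
    for i in range(1, m + 1):
        # Query prefix of length i against an empty portion: i gaps.
        curr = [(i * gap_cost, 0)]
        for j in range(1, n + 1):
            same = query[i - 1] == word[j - 1]
            candidates = [
                # Match or substitute the two symbols.
                (prev[j - 1][0] + (0 if same else sub_cost), prev[j - 1][1]),
                # Query symbol has no counterpart in the word.
                (prev[j][0] + gap_cost, prev[j][1]),
                # Word symbol has no counterpart in the query.
                (curr[j - 1][0] + gap_cost, curr[j - 1][1]),
            ]
            curr.append(min(candidates))
        prev = curr
    # The match may also end anywhere, so take the cheapest end position.
    end = min(range(n + 1), key=lambda j: prev[j][0])
    cost, start = prev[end]
    return cost, start, end
```

For instance, `best_portion_match("search", "researching")` returns `(0, 2, 8)`, locating the exact portion `"search"` inside the longer word. A small nonzero cost indicates an approximate partial match, and a threshold on the cost relative to the query length could serve as the relevance decision.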
Similar resources
Searching in Document Images
Searching in scanned documents is an important problem in Digital Libraries. If OCRs are not available, the scanned images are inaccessible. In this paper, we demonstrate a searching procedure without an intermediate textual representation. We achieve effective retrieval from document databases by matching at word-level using image features. Word profiles, structural features and transform doma...
Keyword Spotting on Hangul Document Images Using Two-Level Image-to-Image Matching
Many printed documents and books have been published and stored as images in digital libraries. Searching for a specified query word in document images is a challenging problem. OCR software converts the images into machine-readable documents so that their full content can be searched [1]. Another approach [1, 2] is image-based, in which both the document images and word inform...
Word Searching in CCITT Group 4 Compressed Document Images
In this paper, we present a compressed pattern matching method for searching for user-queried words in CCITT Group 4 compressed document images without decompression. The feature pixels, composed of black changing elements and white changing elements, are extracted directly from the CCITT Group 4 compressed document images. The connected components are labeled based on a line-by-line strategy ac...
Word Spotting in Chinese Document Images without Layout Analysis
An approach to searching user-specified words/phrases in Chinese document images, without the requirements of layout analysis, is proposed in this paper. Bounding boxes of Chinese character images are first determined using connected component analysis. Next, a suitable character from the user-specified word/phrase is chosen as the initial character to search for a matching candidate in the doc...
An Intelligent System for Exact Word Retrieval in Document Databases
Automatic information retrieval from document image databases is an important and challenging task. The main challenges are font style, size and spacing between characters. To meet these challenges, we propose a new technique for matching exact word strings from document databases. For this approach, we address two issues: word identification and similarity measurement between documents. ...